interface-ip: overwrite resolv.conf file instead of renaming#62
interface-ip: overwrite resolv.conf file instead of renaming#62dbnk0 wants to merge 1 commit intoopenwrt:masterfrom
Conversation
Currently, the change tracking of resolv.conf works as follows: 1. Create a temporary resolv.conf with current entries. 2. Calculate CRC32 of both the temporary and the existing (old) resolv.conf file. 3. If the CRC checksum differs, the temporary file is renamed to replace the old resolv.conf. However, this causes issues with ujail environments where the jail has bind-mounted the resolv.conf from the host. In this case, the new file would not be visible inside the jail until the jail was restarted. This is particularly troublesome for jailed ntpd deamon running on a device without a local DNS server, relying on a DNS info supplied from DHCP (e.g. dumb AP). Very often, the DHCP lease comes later than the ntpd is started, causing time synchronization to fail until the process is manually restarted. The proposed solution modifies the step 3 in the resolv.conf change tracking above: 3. If the CRC checksum differs, the existing resolv.conf is overwritten in place with the new (current) entries (ones from the temporary file). This approach, however, has a drawback - rename is atomic while overwrite is not. To mitigate this issue, a file locking mechanism is added (`flock`). Fixes: - openwrt/openwrt#10389 - openwrt/openwrt#10843 Signed-off-by: Dávid Benko <davidbenko@davidbenko.dev>
|
wont you get more flash wear is you overwrite, instead of rename? |
|
|
Kind ping @dangowrt, @nbd168 or @systemcrash for review. I backported this patch to OpenWrt 24.10.5, deployed it to multiple ramips and filogic APs and recorded no more issues with time synchronization ever since. |
|
Ping @nbd168, @systemcrash once again, I think you are the most appropriate reviewers currently. |
| if (crcold != crcnew) { | ||
| /* Lock the file */ | ||
| if (flock(fileno(f), LOCK_EX | LOCK_NB) < 0) { | ||
| netifd_log_message(L_WARNING, "Failed to exclusively lock " |
There was a problem hiding this comment.
Failed to lock exclusively
There was a problem hiding this comment.
Do you want me to reword the message? It continues on the next line: Failed to exclusively lock resolv.conf (<PATH>): <ERROR MESSAGE>
Failed to lock exclusively resolv.conf (<PATH>): <ERROR MESSAGE> sounds a bit odd to me.
|
Are there scenarios where other agents write to resolv? Like with different whitespace formatting somehow? Not that I think that's an important edge-case to handle. |
None that I know of. I added the file locking as a safeguard mechanism, just in case. |
Currently, the change tracking of resolv.conf works as follows:
However, this causes issues with ujail environments where the jail has bind-mounted the resolv.conf from the host. In this case, the new file would not be visible inside the jail until the jail was restarted.
This is particularly troublesome for jailed ntpd deamon running on a device without a local DNS server, relying on a DNS info supplied from DHCP (e.g. dumb AP). Very often, the DHCP lease comes later than the ntpd is started, causing time synchronization to fail until the process is manually restarted.
The proposed solution modifies the step 3 in the resolv.conf change tracking above:
This approach, however, has a drawback - rename is atomic while overwrite is not. To mitigate the issue, a file locking mechanism is added (
flock).The modified implementation includes more comments and restructured log messages. Major failures regarding writing to the resolv.conf file are now on the
WARNsyslog level.Tested on todays (2025-12-17) snapshot x86/64 build.
Fixes long standing issues in OpenWrt main repository:
As a side note, there are definitely alternative solutions/workarounds for the problem described in the issues above. The simplest one would be to disable ujail for ntpd altogether. This has some security implications, but should be OK for NTP client (and has been done as a workaround in other projects FreifunkFranken/firmware@f48e53c). What about server or client+server tho? Another possibility would be to watch for changes of
/etc/resolv.conf(or/tmp/resolv.conf.d/resolv.conf.autoreally) and restart the ujail when needed. This can be (probably) implemented in the ujail itself (inotify API?) or externally. Internal watchdog could be OK, but possibly a lot of work. External watchdog is not a clean solution, but it's exactly what some users are doing. Ujail could also mount the whole/etcfolder instead of/etc/resolv.conf, but this also has security implications (leaks of unrelated credentials). The list can go on; however, I consider overwriting resolv.conf instead of rename replacement as the most viable solution to the ntpd on dumb AP problem.